Abstract

Army  logistical  systems  and  databases  contain  massive  amounts  of  data  that  require  effective 
methods  of  extracting  actionable  information  and  generating  knowledge.  Vehicle  diagnostics 
and  prognostics  can  be  challenging  to  analyze  from  the  Command  and  Control  (C2) 
perspective,  making  management  of  the  fleet  difficult  within  existing  systems.  Databases  do  not 
contain  root  causes  or  the  case-based  analyses  needed  to  diagnose  or  predict  breakdowns. 
21st  Century  Systems,  Inc.  previously  introduced  the  Agent-Enabled  Logistics  Enterprise 
Intelligence  System  (AELEIS)  to  assist  logistics  analysts  with  assessing  the  availability  and 
prognostics  of  assets  in  the  logistics  pipeline.  One  component  being  developed  within  AELEIS 
is  incorporation  of  the  Mahalanobis-Taguchi  System  (MTS)  to  assist  with  identification  of 
impending fault conditions along with fault identification. This paper presents an analysis of
the application of MTS to data representing a known vehicular fault, showing how
construction of the Mahalanobis Space using competing methodologies can lead to reduced
false positives while still capturing true positive fault conditions. These results are then
discussed within the larger scope of AELEIS and the resulting C2 benefits.

Introduction

During  Operation  Iraqi  Freedom,  Soldiers  of  the  56th  Infantry  Brigade  Combat  Team  providing 
escort  to  a  fuel  convoy  found  themselves  in  the  middle  of  the  Iraqi  desert  waiting  for  recovery 
vehicles  to  arrive  [1].  While  self-recovery  efforts  had  overcome  previous  situations  during  this 
mission, a broken-down truck eventually halted the convoy and placed friendly forces at risk. As
one soldier pointed out, “Whenever we can't self recover, we wait for additional assets to get to
us. Sometimes that wait is only a couple of hours and sometimes it is longer.” Fortunately, all
ended well and the fuel was delivered safely. However, this incident illustrates the challenge
and ramifications of in-theater fleet asset management.

The  Army  is  working  toward  modernizing  its  fleet  logistics  [2]-[4],  yet  the  goal  of  an  effective 
decision  support  system  (DSS)  remains  elusive.  While  much  logistics  data  exists  in  dedicated 
databases,  there  are  limitations  to  providing  proactive  maintenance  type  information  to  avoid 
breakdowns. Efficient combat support and combat service support demand look-ahead capability
in fleet management for future manned and unmanned vehicles. Each of the many databases has
its own specific function, and none provides a full picture for fleet managers, operators,
and commanders to utilize. Furthermore, these databases do not adequately identify failure
modes or support case-based, root-cause analysis.

New  tools  are  needed  that  provide  up-to-date  information  mined  from  the  available  data  on  the 
vehicle fleet to find potential problems. An effective Enterprise Intelligence System will find data
from as many sources as possible, process it in an integrated fashion, and disseminate
information products on the readiness of the fleet vehicles. By doing so, we may be able to avoid
such scenarios as waiting for recovery vehicles in a combat zone. Keeping vehicles
operational requires a combination of predictive health maintenance (condition-based maintenance
plus root-cause analysis) and parts supply availability. The Agent-Enabled Logistics Enterprise
Intelligence System (AELEIS) is being developed as a tool to assist logistics analysts with
assessing the availability and prognostics of assets in the logistics pipeline with data from
multiple, heterogeneous sources. Data is aggregated and mined for trends, and reasoning
and prognostics tools evaluate the data for relevance and potential issues.

Within  AELEIS,  we  are  developing  a  comprehensive  failure  mode  diagnosis  and  health 
condition  assessment  technique  for  vehicle  health  by  employing  the  Mahalanobis-Taguchi 
System  (MTS)  based  multi-parameter,  multi-input  pattern  recognition  methodology.  The  MTS 
analysis  provides  a  real-time,  continuous  monitoring  system  that  will  take  vehicle  history  data 
and  translate  it  into  a  probability  of  failure.  Data  acquired  on  vehicle  history  and  maintenance 
repair  will  be  mined  and  added  to  a  database  and  used  within  the  probability  of  failure 
calculations  and  revalidation  to  create  a  learning  system.  The  MTS  methodology  is  selected  due 
to  its  reported  accuracy  in  forecasting  trends  observed  in  correlated  data  sets  without  intensive 
computations  (thus  lower  cost)  [5]. 

This paper chronicles some of the challenges experienced in extending the MTS
approach to the available data, as well as initial results from adapting MTS to these vehicular
data sets. We believe our approach takes a more holistic view than the initial strategy,
accounting for the impact on false negatives as well as false positives within the resulting
analyses. Examining the impact of applying MTS during the development stages yields more
meaningful and better understood results. This paper briefly overviews
the AELEIS concept, showing where MTS fits and why it was selected over alternative
approaches. A more detailed discussion of MTS is then presented, followed by its application
within the vehicle logistics domain. We wrap up with a brief summary and key lessons learned.

AELEIS Concepts

Figure  1  shows  a  conceptual  view  of  the  AELEIS  decision  tool.  Using  a  Service  Oriented 
Architecture  (SOA),  AELEIS  data  extraction  agents  connect  to  the  various  databases  and  other 
data  sources.  The  extraction  agents  scan  the  databases  for  key  pieces  of  data  and  then  publish 
that  data  back  to  the  AELEIS  Central  Core.  There,  the  reasoning  agents  determine  what  tools 
are  needed  based  on  data  clustering.  The  mining  and  trending  agents  find  the  information  in  the 
data  (they  may  also  instruct  the  extraction  agents  on  further  data  to  find).  Finally,  compiled 
actionable  information  is  published  in  a  standard  format  out  to  users.  This  information  is  then 
picked  up  by  a  decision  tool  allowing  the  user  to  see  the  information,  drill-down  to  see  where  the 
information  originated,  and  make  an  informed  decision  on  the  logistics  plan. 

Our research into causal data mining looked into Support Vector Machines (SVM) and radial
basis function neural networks. Both of these methods are kernel-based approaches; however,
they are self-organizing during training. The problem we encountered with this type of method
is that the results become ambiguous. A further challenge was that we would have to generate much
more data than we currently had in order to adequately train these algorithms. We needed
an algorithm that would provide clear results with much less data. Our research efforts led us
to the Mahalanobis-Taguchi System. MTS provided the clear diagnostics capability we
were looking for while needing much less data. Our initial results showed that the MTS
algorithm is able to perform the diagnostics task with representative data. We used this as a
starting point to more fully develop the MTS algorithm. In future work we plan to expand on how
MTS can also find causal relationships in the data.

Figure 1: Conceptual view of the AELEIS decision tool.

Preliminary AELEIS development relied on simulated data that was constructed to emulate
degradation of performance for three fault modes over time. We have since obtained
a variety of government-furnished data, which we have used to more fully develop AELEIS (and
the MTS approach) as outlined here.

Mahalanobis Distance

Mahalanobis  Distance  (MD)  is  a  distance  measure  derived  from  an  analysis  of  the  deviation  in 
the  mean  values  of  different  variables  in  multivariate  analysis  considering  the  correlation 
between  the  variables.  As  a  discriminant  analysis  method,  MD  is  useful  in  determining  the 
similarity  of  a  set  of  values  from  an  unknown  sample  to  a  set  of  values  measured  from  a 
collection  of  known  samples.  MD  proves  to  be  superior  to  other  multidimensional  distance 
measures  due  to  the  following  [6]: 

•  Correlation  between  the  variables  is  used  in  its  calculation. 

•  It  is  very  sensitive  to  inter-variable  changes  in  the  reference  data. 

•  It  is  not  affected  by  the  dimensionality  of  the  dataset. 

Assume the dataset consists of $k$ variables, where $i$ is the variable index ($i = 1, 2, \ldots, k$),
$n$ is the number of samples in the dataset, and $j$ is the sample number ($j = 1, 2, \ldots, n$).
The variables are standardized as defined in Equation (1).

$$z_{ij} = \frac{x_{ij} - m_i}{s_i} \qquad (1)$$

Here $m_i$ and $s_i$ represent the mean and standard deviation of the $i$th variable, respectively,
and $z_{ij}$ is the standardized vector obtained from the standardized values of $x_{ij}$. MD values
are calculated as defined in Equation (2).

$$MD_j = \frac{1}{k}\, z_{ij}^{T} C^{-1} z_{ij} \qquad (2)$$

$MD_j$ is the Mahalanobis distance calculated for the $j$th case and $C^{-1}$ represents the inverse
of the correlation matrix of the variables in the dataset.
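
As an illustration of Equations (1) and (2), the following minimal Python sketch standardizes a
small reference set, builds the correlation matrix, and computes the scaled MD values. It is not
the AELEIS implementation: the function names, the toy data, and the choice of population
statistics (dividing by n) are assumptions made here so that the reference group averages
exactly 1. In practice a numerical library would replace the hand-written matrix inverse.

```python
import math

def standardize(data):
    """Equation (1): z-scores using each column's mean and population std (assumed)."""
    n, k = len(data), len(data[0])
    means = [sum(row[i] for row in data) / n for i in range(k)]
    stds = [math.sqrt(sum((row[i] - means[i]) ** 2 for row in data) / n)
            for i in range(k)]
    z = [[(row[i] - means[i]) / stds[i] for i in range(k)] for row in data]
    return z, means, stds

def correlation(z):
    """Correlation matrix of already-standardized rows."""
    n, k = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / n for b in range(k)] for a in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting; raises if the matrix is singular."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def mahalanobis(z_row, c_inv):
    """Equation (2): quadratic form scaled by the number of variables k."""
    k = len(z_row)
    return sum(z_row[a] * c_inv[a][b] * z_row[b]
               for a in range(k) for b in range(k)) / k

# Toy "normal" group with two positively correlated variables (invented values).
reference = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]]
z, means, stds = standardize(reference)
c_inv = invert(correlation(z))
md_values = [mahalanobis(row, c_inv) for row in z]
mean_md = sum(md_values) / len(md_values)

# A new observation is standardized with the reference mean and std before scoring.
new_point = [1.0, 6.0]  # breaks the correlation pattern of the reference group
z_new = [(new_point[i] - means[i]) / stds[i] for i in range(len(new_point))]
md_off_trend = mahalanobis(z_new, c_inv)
```

The off-trend point has unremarkable values in each variable taken alone, yet it scores well above
every reference sample because MD accounts for the correlation between the variables.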

Mahalanobis-Taguchi System

Genichi Taguchi applied a robust engineering methodology using Mahalanobis distances to
develop the Mahalanobis-Taguchi Strategy (MTS) as a diagnosis and forecasting method for
multivariate data. It is a pattern recognition technology that assists in quantitative
decision-making by constructing a multivariate measurement scale using data analytic procedures
with the MD values [6], [7]. MTS can be used to develop a scale to measure the degree of
abnormality of data measurements compared to a calculated “normal”.

Within  MTS,  initial  Mahalanobis  distances  are  calculated,  then  orthogonal  arrays  (OA)  and 
signal-to-noise  (S/N)  ratio  are  used  to  identify  attributes  of  importance.  Attributes  adding  only 
noise  and  not  signal  are  removed  from  the  process,  validating  against  known  abnormal 
conditions.  In  developing  a  multivariate  measurement  scale  it  is  important  to  (1)  have  a 
reference  point  to  the  scale,  (2)  validate  the  scale,  (3)  select  the  important  variables  adequate 
for  measuring  abnormality,  and  (4)  be  able  to  carry  out  future  diagnosis  with  the  measurement 
scale.  These  form  the  basis  of  MTS  application  with  the  steps  formalized  in  Figure  2. 
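
The variable-screening step can be sketched as follows. The paper does not state which S/N
formulation was used; this sketch assumes the larger-the-better form often described in the MTS
literature, applied to the abnormal-group MD values of each orthogonal-array run, and it assumes
the usual two-level convention (1 = variable included, 2 = excluded). The function names and the
S/N numbers are hypothetical.

```python
import math

def sn_larger_the_better(md_values):
    """Larger-the-better S/N ratio in dB for one run's abnormal-group MD values."""
    return -10.0 * math.log10(sum(1.0 / md for md in md_values) / len(md_values))

def variable_gains(design, sn_by_run):
    """Gain per variable: mean S/N of runs that include it minus runs that exclude it."""
    gains = []
    for var in range(len(design[0])):
        used = [sn for row, sn in zip(design, sn_by_run) if row[var] == 1]
        unused = [sn for row, sn in zip(design, sn_by_run) if row[var] == 2]
        gains.append(sum(used) / len(used) - sum(unused) / len(unused))
    return gains

# L4 two-level orthogonal array for three candidate variables.
L4 = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]

# Hypothetical S/N values (dB), one per run of the array.
sn_by_run = [4.0, 2.0, 1.0, 3.0]
gains = variable_gains(L4, sn_by_run)

# S/N for a single run whose abnormal samples all have MD = 10.
sn_example = sn_larger_the_better([10.0, 10.0])
```

Under these assumptions a positive gain suggests the variable adds signal and a negative gain
suggests it adds noise, which is the sense in which the S/N charts later in the paper are read.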


Figure 2: MTS process as outlined by Taguchi

The Mahalanobis-Taguchi System (MTS) was identified to work the diagnostics/prognostics
challenge. MTS can be used for fault detection, isolation, and prognostics [7]-[9]. Previously,
we used MTS to fuse data from multiple sensors into a single system-level performance metric
using Mahalanobis Distance (MD) and to generate fault clusters based on MD values. MD
thresholds derived from clustering analysis were used for fault detection and isolation. Figure 3
(a) shows a conceptual view whereby the MD (magnitude and angle) can help detect that a fault
is occurring and which type of fault. Figure 3 (b) shows the same concept with a compound
fault. In this case, either Fault 1 or Fault 2 may be indicated by the MD. In particular, a change
in parameters would be needed to properly identify the fault. By creating a self-learning scheme,
the proper faults can be identified and, more importantly, so can the parameters to use to separate
the faults. Given the manner in which we developed the initial simulated AELEIS data, we were
most likely to see the compound fault situation occur in the Fuel Injector vs. Fuel Filter fault
conditions.




Figure  3:  Mahalanobis-Taguchi  System  (MTS)  where  the  Mahalanobis  distance  around  a 
fault  cluster  determines  the  variance  from  normal  (lower  left  corner)  for  simple  fault 
conditions  (a)  and  compound  fault  conditions  (b). 

Figure 4 shows an example of a compound fault that is indistinguishable using only outlet
pressure on a pump [10]. The MTS method holds the most promise for root cause fault
classification. The main drawback of the MTS method is that the possible root causes
must be known a priori, primarily because of the training and placement of the cluster locations
within the Mahalanobis space. Other methods that we have looked into for root cause analysis
include Support Vector Machines (SVM) and radial basis function neural networks. Both of
these methods are kernel-based approaches; however, they are self-organizing during training.
The problem we encounter with this type of method is that the results become ambiguous,
whereas the results of MTS make the root cause easy to understand and even allow
for predictive analysis.

Figure 4: Example MD based fault clusters using only the outlet pressure for a pump [10].


Applying  the  MTS  Approach 

Concurrent  with  AELEIS  development,  the  Army  Tank-Automotive  Research  Development  & 
Engineering  Center  (TARDEC)  has  been  investigating  the  opportunities  of  capturing  individually 
identifiable  data  from  fleet  vehicles  for  health  and  maintenance  capabilities.  Portions  of  this  data 
have  been  made  available  for  research  within  AELEIS,  specifically  the  detection  and  prediction 
capabilities  offered  through  MTS.  Due  to  the  nature  of  the  data,  certain  details  cannot  be 
provided.  However,  general  details  are  provided  along  with  results. 

Two primary data sets were available for testing and development, which we label pre-launch and
post-launch. The pre-launch data was collected over a longer time period (roughly 3 years) and
consisted of a smaller number of vehicles (around 10). Data formatting was fairly uniform across
the entire data space, with the same 51 attributes available for each collection but collected at
different frequencies. Data collections were performed for each vehicle “run”, which could last
from a few minutes to many hours. The post-launch data had been collected over approximately
one year with a much larger breadth of vehicles (hundreds). Data formatting was often not
consistent across vehicles (which could be different types) and could be inconsistent within a
single vehicle’s files (for example, different attribute orderings, or invalid data when sensors
were not operational or installed). Attributes for the post-launch vehicles typically ranged from
120 to 150 per vehicle.

Fault conditions are key to implementing the MTS approach as they drive the selection of
variables used in the final MD calculations. The pre-launch vehicles had one identified fault
condition provided. This fault was the first explored and is the one addressed further here.
The post-launch vehicles had over 100 documented fault (or potential
fault) cases, two of which were examined following the pre-launch examination. These initial
MTS efforts helped hone the process of applying a modified Mahalanobis-Taguchi approach to
what we will call the pre-launch vehicle’s fault F.

Previous TARDEC analysis had identified differences in the data between a run prior to and a run
following the fix of fault F. This analysis referenced a data set recorded approximately 6 runs
prior to the fix and another set recorded approximately 6 runs following it. The initial MTS
analysis focused on these two runs, attempting to best compare ‘apples to apples’ with the
existing analysis. For this initial look, we selected the most relevant subset of fields that were
all collected at the same sampling rate. Of these 6 fields, one was used to divide the data
(following the TARDEC analysis criteria) and another was removed due to its S/N ratio, leaving
4 factors. These initial 4 factors were successful in discriminating the normal and test cases of
fault F as shown in Figure 5. However, applying the Mahalanobis Space to other runs both prior to
and following the fix did not provide as much consistency as desired.

Figure 5: MD values from the normal (left) and abnormal (right) data with initial 4-factors

The  post-launch  data  simplified  some  of  the  pre-launch  challenges  in  that  there  were  numerous 
faults  and  the  data  was  all  collected  at  the  same  fidelity,  eliminating  the  need  for  re-sampling. 
Two  instances  of  a  similar  fault  were  examined  starting  from  a  much  richer  set  of  attributes  than 
the  original  fault  F  scope.  This  post-launch  examination  took  30  variables  and  iteratively 
reduced  them  down  to  6.  Within  this  analysis,  the  removal  of  outliers  was  introduced  for 
construction  of  the  Mahalanobis  Spaces. 

Traditionally, the entire normal group is utilized to construct the unit space as step 1 in MTS.
However, with the amount and variability of the data (due to vehicles, sensors, etc.), the resulting
Mahalanobis Space (MS) from the vehicular data is not necessarily as clean as in existing work.
Therefore, we sought mechanisms to remove outliers from consideration in the initial MS
construction. Removing outliers constructs a “narrower” space, which leads to better detection of
abnormal conditions (increasing true positives and decreasing false negatives) at the expense of
flagging some of the removed normal data as abnormal (false positives). This analysis showed
that the S/N ratio profiles were apt to change as more aggressive thresholds were implemented.
The result was an increased understanding of the ramifications of variable inclusion/exclusion
within the scope of false negative and false positive detection. These initial investigations led
us to revisit fault F and to the following analysis of which variables seem most appropriate
from the entire perspective.

To simplify the scope of examination, pre-launch data was normalized to the same frequency as
the post-launch data. This required both up-sampling and down-sampling of the data,
depending on the attribute. Some of the fields may not have had enough variance (e.g., if the
standard deviation was 0, the resultant correlation matrix would be singular and no inverse
could be obtained) and others, such as the day of the week, were irrelevant or redundant. The
preliminary work took the 51 variables down to 16. In addition, the normal data was constructed
from the entire 6 runs following the fix, and a test data set was constructed from the 6 runs prior
to the fix. The single-run test data was also retained and used within analyses.
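
The variance screening mentioned above can be sketched in a few lines. This is an illustrative
assumption about how such a filter might look, not the program used in the study; the function
name, the column names, and the sample values are all hypothetical.

```python
import statistics

def usable_columns(rows, names, min_std=1e-12):
    """Return the names of columns whose standard deviation is above min_std.

    A column with (near) zero spread cannot be standardized and would make
    the correlation matrix non-invertible, so it is dropped before the
    Mahalanobis Space is built.
    """
    keep = []
    for i, name in enumerate(names):
        column = [row[i] for row in rows]
        if statistics.pstdev(column) > min_std:
            keep.append(name)
    return keep

# Hypothetical run data: the middle sensor never changes (e.g. not installed).
names = ["engine_rpm", "sensor_offline", "coolant_temp"]
rows = [
    [800.0, 0.0, 90.0],
    [1200.0, 0.0, 95.0],
    [1500.0, 0.0, 93.0],
]
kept = usable_columns(rows, names)
```

Fields that are irrelevant or redundant for other reasons, such as the day of the week, still
need to be removed by inspection; a variance check alone would not catch them.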

From the established variable set, the initial MTS data analysis step was performed, comparing
the constructed MS against the abnormal data. Figure 6 shows the result utilizing the entire
6-run dataset for the abnormal condition. There is a noticeable distinction in MD values between
the normal and abnormal groups (as seen in the top two charts), yet all variables appear to add
more noise than signal (larger S/N ratios are better; thus, among negative S/N ratios, those
closer to zero are better).

Figure 6: Fault F 16 variable MS showing MD values from the normal group (top), full abnormal group
(middle) and the S/N ratio dB values (bottom)

This MS was also compared against the single abnormal run as previously tested (and selected
to align with TARDEC’s independent analysis). Figure 7 shows the same MS evaluated
against the single test file. In this case, notice there are two variables with a positive S/N ratio.
This indicates that these two variables are most useful in characterizing these abnormal values
as abnormal. In other words, the S/N ratio is a measure of the discrimination capability of
each variable. Using this S/N ratio information in isolation might have caused inclusion of these
variables within the refinement of the Mahalanobis Space. However, consulting Figure 6
shows this to be a bad idea. Notice the single-file fault space favors variables 6 and 7
(slightly positive and slightly negative, respectively). Comparing to the S/N ratios over the larger
fault space we wish to detect (Figure 6) shows that these two variables do not add nearly as
much signal across the other files where the vehicle is operating within the fault condition.


Figure 7: Fault F 16 variable MS showing MD values from the normal group (top), single abnormal file (middle)
and the S/N ratio dB values (bottom)

This first analysis shows why the initial work effort on fault F may have been flawed by being too
narrow in scope. MTS does fairly well at discerning normal from abnormal, but with the
stochastic nature of vehicle use, we may desire to build in more generality to account for the
“instability” of the normal data. In addition to considering the larger data sets, the multiple
threshold investigation was applied to fault F. As mentioned above, the more outlier removal
performed prior to construction of the MS, the better the detection achieved. However, the
caveat is that increased discrimination in detecting the abnormal data also increases false
positive detection within the removed normal data. This often occurs at the extreme cases within
the threshold-removal process. For instance, Figure 8 shows the Mahalanobis Distance values for
the normal data against the Mahalanobis Spaces constructed with MD thresholds of 2 and 1.5.

Understanding the threshold process begins with understanding the Mahalanobis Space. The
statistical nature of MDs produces a unit Mahalanobis Space from the normal data. This means
the average MD value across the normal data will be 1. The validation of the MS essentially
ensures that the MD values of the abnormal set are significantly distinguishable from the normal
MD values. To automate the outlier removal process, an intermediate program was constructed to
iteratively filter out any data instances where the MD value was above the threshold value. This
program could take in any initial “normal” group and tune it down to remove outliers from
construction of the MS. The higher the threshold, the fewer data removed. The lower the threshold,
the more data removed and the higher the chance of reducing false negatives and increasing false
positives.

Figure 8: Fault F 16 variable MS plotting the entire Normal group’s MDs calculated from Mahalanobis Spaces
constructed using a threshold at 2 (above) and 1.5 (below)
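
The threshold-driven outlier removal described above can be sketched as a loop that rebuilds the
space after each pass. This is an assumption about how the intermediate program might work, since
the paper does not give its details. To stay short, the sketch uses a single variable, for which
the scaled MD reduces to the squared z-score; the function names and data are hypothetical.

```python
def build_unit_space(values):
    """Return an MD scorer for one variable (k = 1): MD = (x - mean)^2 / variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    if variance == 0:
        raise ValueError("no variance left; unit space cannot be built")
    return lambda v: (v - mean) ** 2 / variance

def trim_normal_group(values, threshold, max_passes=100):
    """Iteratively drop samples whose MD exceeds the threshold.

    After each removal the space is rebuilt from the survivors, because the
    mean and variance (and therefore every MD) change once outliers are gone.
    """
    kept = list(values)
    for _ in range(max_passes):
        md = build_unit_space(kept)
        survivors = [v for v in kept if md(v) <= threshold]
        if len(survivors) == len(kept):
            break  # nothing exceeded the threshold; the space is stable
        kept = survivors
    return kept

# Hypothetical "normal" readings containing one gross outlier.
normal_group = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
kept = trim_normal_group(normal_group, threshold=3.0)
removed_fraction = 1 - len(kept) / len(normal_group)
```

Lower thresholds remove more of the normal group on each pass, which matches the trend reported
in the removal tables: the removed fraction grows as the threshold drops.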


Again, Figure 8 shows graphically the potential increase in false positives with markedly
increased MD values when using the 1.5 threshold group to create the Mahalanobis Space.
However, even the threshold at 2 shows at least two major areas prone to false positives.
Table 1 shows the threshold MD values along with the percentage of data removed with that
threshold.
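
The false positive and false negative trade-off can be tallied with a short sketch once MD values
have been computed for the full normal group and the abnormal group. The decision cutoff on MD is
a hypothetical parameter here, as the paper does not state one, and the MD values below are
invented for illustration.

```python
def detection_rates(normal_mds, abnormal_mds, cutoff):
    """Count errors when any sample with MD above the cutoff is flagged abnormal."""
    false_positives = sum(1 for md in normal_mds if md > cutoff)
    false_negatives = sum(1 for md in abnormal_mds if md <= cutoff)
    return {
        "false_positive_rate": false_positives / len(normal_mds),
        "false_negative_rate": false_negatives / len(abnormal_mds),
    }

# Invented MD values scored against one candidate Mahalanobis Space.
normal_mds = [0.5, 0.8, 1.2, 3.5]
abnormal_mds = [2.5, 6.0, 9.0, 1.0]
rates = detection_rates(normal_mds, abnormal_mds, cutoff=2.0)
```

Repeating this tally for spaces built at different removal thresholds gives a numerical
counterpart to the visual comparison of the MD plots.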


Table 1: Thresholds and data removed from MS construction

Threshold MD Value    Normal Data Removed From MS Construction
10                    0.72%
9                     0.76%
8                     0.81%
7                     0.94%
6                     1.11%
5                     1.43%
4                     1.91%
3                     12.1%
2                     15.4%
1.5                   54.6%

It is not surprising that such an increase in removal between thresholds of 2 and 1.5
results in such an increase in MD values in the resulting Mahalanobis Space. Additionally,
consulting the same plots for thresholds of 6, 5, and 4 shows the tipping point of introducing
additional false positives as shown in Figure 9.

Figure 9: Fault F 16 variable MS plotting the entire Normal group’s MDs calculated from Mahalanobis Spaces
constructed using a threshold at 6 (above), 5 (middle) and 4 (below)


In addition to observing the impact on potential false positives, the S/N ratios were analyzed
using the threshold-created spaces against the targeted test data (complete set). As the
thresholds changed, the MS data changed and therefore the S/N profiles changed as well. This
was used to determine trends, indicating which variables may be becoming more or less useful
in identifying the abnormal data. Examples of the 16-variable data with the same thresholds of
6, 5, and 4 are shown as Figure 10, Figure 11, and Figure 12.

From this analysis, the highest noise (lowest S/N ratio) variables were removed, leaving any
variables that seemed potentially useful even though their S/N ratios were still negative. The
result was a reduction from 16 variables to 10 variables. The initial impression was that an entire
set of 8 variables exhibiting similar characteristics would be ripe for removal. Interestingly,
following the threshold investigation, two seemed to hold enough potential to remain in additional
analyses.

Figure 10: Normal MD values, Test MD values, and S/N dB ratio from MS with threshold at 6

Figure 11: Normal MD values, Test MD values, and S/N dB ratio from MS with threshold at 5

Figure 12: Normal MD values, Test MD values, and S/N dB ratio from MS with threshold at 4

From the 10-variable data, the same threshold analysis was applied, comparing the S/N profiles
for the resultant (entire, original) norm and (complete) test groups across Mahalanobis Spaces
constructed from various thresholds. Table 2 shows the equivalent data removal within this
space, following similar trends to the 16-variable analysis.

Table 2: Thresholds and data removed from MS construction of the 10-variable set

Threshold MD Value    Normal Data Removed From MS Construction
10                    0.64%
9                     0.73%
8                     0.86%
7                     1.13%
6                     2.23%
5                     11.9%
4                     12.0%
3                     12.6%
2                     28.1%

The 10 variables were again reduced down to the most significant 7 and analyzed with the
threshold approach. The removal results are summarized in Table 3.

Table 3: Thresholds and data removed from MS construction of the 7-variable set

Threshold MD Value    Normal Data Removed From MS Construction
10                    0.76%
9                     0.89%
8                     1.16%
7                     11.4%
6                     11.5%
5                     11.5%
4                     11.9%
3                     14.8%
2                     33.6%

Lessons  Learned 

The current analysis of fault F appears to indicate that we should be able to achieve good
detection with reduced false positives with either the 16 variable or 7 variable sets. These two
configurations are shown in Figure 13 and Figure 14, respectively. It appears that the
robustness of detection may come from spreading the S/N ratios across a wide breadth of
variables, as opposed to the traditional approach of focusing heavily on variable reduction.

One of the problems hinted at with the post-launch vehicle data is that the same variables are not
always available across vehicles. This complicates applying the MTS process, as missing
variables are not traditionally encountered within MTS. If a method can be implemented for
handling missing data, having a larger set of attributes to draw on for the MD values may be of
additional benefit. Continuing research will explore the tradeoffs between reducing the variable
set and increasing the robustness of the fault detection, balancing the reduction of false
negatives against the increase of false positives.

Figure 13: 16 Variable MS Threshold 5, MD values of the threshold space, full test
space, and full normal space

Figure 14: 7 Variable MS Threshold 8, MD values of the threshold space, full test space, and full
normal space


UNCLASSIFIED 




Acknowledgment 

The authors would like to acknowledge the support of the US Army Tank-Automotive Research
Development & Engineering Center (Contract No: W56HZV-12-C-0004).

References 

[1] Staff Sgt. Jason Kendrick, “Texas Unit at Home in the Iraq Wilderness,” Army
National Guard, 01-Jul-2009.
http://www.arng.army.mil/News/Pages/TexasUnitatHomeinthelraqWilderness.aspx

[2] Defense Update, “Global Combat Support System-Army (GCSS-Army).”
http://defense-update.com/products/g/gcss.htm

[3] U.S. Army, “Logistics Modernization Program (LMP).”
https://www.po.lmp.army.mil/_site/index.html

[4] J. Lasher, “LOGSA TOOLS Logistics Information Warehouse (LIW).”
http://www.gardenstatesole.org/events/symposium06/lasher06.pdf

[5] J. Hong, E. A. Cudney, G. Taguchi, R. Jugulum, K. Paryani, and K. M. Ragsdell, “A
comparison study of Mahalanobis-Taguchi system and neural network for multivariate
pattern recognition,” in ASME 2005 International Mechanical Engineering Congress and
Exposition (IMECE2005), November 5-11, 2005, Orlando, Florida, USA, 2005, pp. 109-115.

[6] G. Taguchi and R. Jugulum, The Mahalanobis-Taguchi Strategy: A Pattern Technology
System, 1st ed. Wiley, 2002.

[7] G. Taguchi, S. Chowdhury, and Y. Wu, The Mahalanobis-Taguchi System, 1st ed.
McGraw-Hill Professional, 2000.

[8] D. Mohan, C. Saygin, and J. Sarangapani, “Real-time detection of grip length deviation
during pull-type fastening: a Mahalanobis-Taguchi System (MTS)-based approach,” Int J
Adv Manuf Technol, vol. 39, no. 9-10, pp. 995-1008, Nov. 2007.

[9] L. Nie, M. H. Azarian, M. Keimasi, and M. Pecht, “Prognostics of ceramic capacitor
temperature-humidity-bias reliability using Mahalanobis distance analysis,” Circuit World,
vol. 33, no. 3, pp. 21-28, 2007.

[10] A. Soylemezoglu, “Sensor Data-Based Decision Making,” dissertation, Missouri University
of Science and Technology, 2010.




